Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes
Abstract
Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal/plant species). Such taxonomic data are often exploited as if they were flat nominal data without hierarchy, which discards substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that makes many of the algorithms and analytical techniques designed for numerical data applicable to those data. We show how to compute descriptive statistics like the mean, the variance and the covariance on marginality-mapped data. We also define a mathematical distance between records that include hierarchical attributes, based on marginality-based variances. Such a distance paves the way to reusing, on taxonomic data, the clustering and anonymization techniques designed for numerical data.
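The abstract does not spell out how marginality is computed. As a minimal sketch of one plausible reading, assume the marginality of a category is the sum of its taxonomic path distances (edge counts) to every value in the sample, and that the "mean" of a sample is the category with the smallest marginality. Both definitions, the toy taxonomy `PARENT`, and all function names below are illustrative assumptions, not taken from the source.

```python
# Hypothetical toy taxonomy given as child -> parent links (not from the source).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "eagle": "bird",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the path between a and b in the taxonomy."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:  # first common ancestor found
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a root")


def marginality(value, sample):
    """Assumed definition: sum of distances from value to every sample value."""
    return sum(tree_distance(value, other) for other in sample)


def marginality_mean(sample):
    """Assumed 'mean': the sample category with the smallest marginality."""
    return min(sorted(set(sample)), key=lambda v: marginality(v, sample))


sample = ["dog", "dog", "cat", "eagle"]
scores = {v: marginality(v, sample) for v in sorted(set(sample))}
center = marginality_mean(sample)
```

Under these assumptions, the most central category of the sample gets the lowest score and the most outlying one the highest, which is what lets a hierarchy-aware numeric value stand in for each nominal category.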
Similar resources
Anonymization Methods for Taxonomic Microdata
Often microdata sets contain attributes which are neither numerical nor ordinal, but take nominal values from a taxonomy, ontology or classification (e.g. diagnosis in a medical data set about patients, economic activity in an economic data set, etc.). Such data sets must be anonymized if transferred outside the data collector’s premises (e.g. hospital or national statistical office), say, for ...
Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes
The purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original ...
Anonymization of nominal data based on semantic marginality
Nominal attributes are very common in data sets about individuals, specifically medical data like patient healthcare records. Attributes of this type tend to be sensitive due to their personal nature. If public-use data sets need to be released, e.g. for clinical research purposes, data should be first anonymized. However, since most anonymization methods omit data semantics when dealing with n...
Elite Opposition-based Artificial Bee Colony Algorithm for Global Optimization
Numerous problems in engineering and science can be converted into optimization problems. Artificial bee colony (ABC) algorithm is a newly developed stochastic optimization algorithm and has been widely used in many areas. However, due to the stochastic characteristics of its solution search equation, the traditional ABC algorithm often suffers from poor exploitation. Aiming at this weakness o...
NUMERICAL TAXONOMIC STUDY OF THE IRANIAN SPECIES OF ALYSSUM L. BASED ON MORPHOLOGICAL CHARACTERS
The genus Alyssum L. belongs to the subtribe Alyssinae, tribe Alysseae and family Cruciferae (Brassicaceae). This genus is one of the largest genera of the family of Cruciferae in Iran, and seems to be the most problematic genus in which the boundary of certain species is not completely clear due to the polymorphism of morphological characters. The main objective of this research is to stud...